Abstract

Peer-to-peer (P2P) overlay networks such as BitTorrent and Avalanche are increasingly
used for disseminating potentially large files from a server to many end users via
the Internet. The key idea is to divide the file into many equally-sized parts and then 
let users download each part (or, for network coding based systems such as Avalanche, 
linear combinations of the parts) either from the server or from another user who has 
already downloaded it. However, their performance evaluation has typically been lim- 
ited to comparing one system relative to another and typically been realized by means of 
simulation and measurements. In contrast, we provide an analytic performance analysis 
that is based on a new uplink-sharing version of the well-known broadcasting problem. 
Assuming equal upload capacities, we show that the minimal time to disseminate the file 
is the same as for the simultaneous send/receive version of the broadcasting problem. For 
general upload capacities, we provide a mixed integer linear program (MILP) solution and 
a complementary fluid limit solution. We thus provide a lower bound which can be used 
as a performance benchmark for any P2P file dissemination system. We also investigate 
the performance of a decentralized strategy, providing evidence that the performance of 
necessarily decentralized P2P file dissemination systems should be close to this bound 
and therefore that it is useful in practice. 

1 Introduction 

Suppose that M messages of equal length are initially known only at a single source node
in a network. The so-called broadcasting problem is about disseminating these M messages 
to a population of N other nodes in the least possible time, subject to capacity constraints 
along the links of the network. The assumption is that once a node has received one of the 
messages it can participate subsequently in sending that message to its neighbouring nodes. 

1.1 P2P file dissemination background and related work 

In recent years, overlay networks have proven a popular way of disseminating potentially 
large files (such as a new software product or a video) from a single server S to a potentially
large group of N end users via the Internet. A number of algorithms and protocols have 
been suggested, implemented and studied. In particular, much attention has been given to 
peer-to-peer (P2P) systems such as BitTorrent [HI, Slurpie [HHI, SplitStream 0, Bullet' [20] 
and Avalanche [12], to name but a few. The key idea is that the file is divided into M parts
of equal size and that a given user may download any one of these (or, for network coding 
based systems such as Avalanche, linear combinations of these) either from the server or from
a peer who has previously downloaded it. That is, the end users collaborate by forming a 
P2P network of peers, so they can download from one another as well as from the server. 
Our motivation for revisiting the broadcasting problem is the performance analysis of such 
systems. 

With the BitTorrent protocol, for example, when the load on the server is heavy, the
protocol delegates most of the uploading burden to the users who have already downloaded 
parts of the file, and who can start uploading those parts to their peers. File parts are typically 
1/4 megabyte (MB) in size. An application helps downloading peers to find each other by 
supplying lists of contact information about randomly selected peers also downloading the 
file. Peers use this information to connect to a number of neighbours. A full description can 
be found in [H]. The BitTorrent protocol has been implemented successfully and is deployed
widely. A detailed measurement study of the BitTorrent system is reported in [30]. According
to |2n]) BitTorrent's share of the total P2P traffic has reached 53% in June 2004. For recent 
measurements of the total P2P traffic on Internet backbones see [19].

Slurpie is a very similar protocol, although, unlike BitTorrent, it does not fix the 
number of neighbours and it adapts to varying bandwidth conditions. Other P2P overlay 
networks have also been proposed. For example see SplitStream jjj and Bullet' [20] . 

More recently, Avalanche [12], a scheme based on network coding, has been suggested.
Here, users download linear combinations of file parts rather than individual file parts. This 
ensures that users do not need to find specific parts in the system, but that any upload by a 
given user can be of interest to any peer. Thus, network coding can improve performance in 
a decentralized scenario. Our results apply to any P2P file dissemination system, whether or 
not it uses network coding. 

Performance analysis of P2P systems for file dissemination has typically been limited to 
comparing one system relative to another and typically been realized by means of simulation 
and measurements. We give the makespan, that is the minimal time to fully disseminate the 
file, of M parts from a server to end users in a centralized scenario. We thereby provide a 
lower bound which can be used as a performance benchmark for any P2P file dissemination 
system. We also investigate the part of the loss in efficiency that is due to the lack of 
centralized control. Using a theoretical analysis, simulation as well as direct computation, we 
show that even a naive randomized strategy disseminates the file in an expected time that 
grows with N in a similar manner to the minimal time achieved with a centralized controller.
This suggests that the performance of necessarily decentralized P2P file dissemination systems 
should still be close to our performance bound so that it is useful in practice. 

In this paper, we provide the scheduling background, proofs and discussion of the results 
in our extended abstracts and [2^1 ■ It is essentially Chapter 2 of [^S] , but we have added 
Theorem 13 and the part on theoretical bounds in Sectional In [HEl the authors also consider 
problems concerned with the service capacity of P2P networks, however, they only give a 
heuristic argument for the makespan with equal upload capacities when N is of the simple
form $2^n - 1$. In [HJ a fluid model for BitTorrent-like networks is introduced and studied,
also looking at the effect of incentive mechanisms to address free-riding. Link utilization and 
fairness are issues in j^j . In |22] , also motivated by the BitTorrent protocol and file swarming 
systems in general, the authors consider a probabilistic model of coupon replication systems.
Multi-torrent systems are discussed in There is other related work in j32j . 

1.2 Scheduling background and related work 

The broadcasting problem has been considered for different network topologies. Comprehen- 
sive surveys can be found in jl5j and ^Hl- On a complete graph, the problem was first solved 
in ISl and JHI- Their communication model was a unidirectional telephone model in which 
each node can either send or receive one message during each round, but cannot do both. In 
this model, the minimal number of rounds required is $2M - 1 + \lfloor \log_2 (N + 1) \rfloor$ for even N,
and 2M -f [log, (N + 1)J - [^^lig^^J for odd N.^ 

In [2], the authors considered the bidirectional telephone model in which nodes can both
send one message and receive one message simultaneously, but they must be matched pairwise. 
That is, in each given round, a node can only receive a message from the same node to which 
it sends a message. They provide an optimal algorithm for odd N, which takes $M + \lfloor \log_2 N \rfloor$
rounds. For even N, their algorithm is optimal up to an additive term of 3, taking
$M + \lfloor \log_2 N \rfloor + M/N + 2$ rounds.

The simultaneous send/receive model [23] supposes that during each round every user may
receive one message and send one message. Unlike the telephone model, it is not required 
that a user can send a message only to the same user from which it receives a message. The 
optimal number of rounds turns out to be $M + \lfloor \log_2 N \rfloor$, and we will return to this result in
Section 3.

In this paper, we are working with our new uplink-sharing model designed for P2P file 
dissemination (cf. Section 2). It is closely related to the simultaneous send/receive model,
but is set in continuous time. Moreover, we permit users to have different upload capacities 
which are the constraints on the data that can be sent per unit of time. This contrasts with 
previous work in which the aim was to model interactions of processors and so it was natural 
to assume that all nodes have equal capacities. Our work also differs from previous work in 
that we are motivated by the evaluation of necessarily decentralized P2P file dissemination 
algorithms, i.e., ones that can be implemented by the users themselves, rather than by a 
centralized controller. Our interest in the centralized case is as a basis for comparison and to 
give a lower bound. We show that in the case of equal upload capacities the optimal number 
of rounds is $M + \lfloor \log_2 N \rfloor$, as for the simultaneous send/receive model. Moreover, we provide
two complementary solutions for the case of general upload capacities and investigate the 
performance of a decentralized strategy. 

1.3 Outlook 

The rest of this paper is organized as follows. In Section 2 we introduce the uplink-sharing
model and relate it to the simultaneous send/receive model. Our optimal algorithm for the
simultaneous send/receive broadcasting problem is presented in Section 3. We show that it
also solves the problem for the uplink-sharing model with equal capacities. In Section 4 we
show that the general uplink-sharing model can be solved via a finite number of mixed integer 
linear programming (MILP) problems. This approach is suitable for a small number of file 
parts M. We provide additional insight through the solution of some special cases. We then
consider the limiting case that the file can be divided into infinitely many parts and provide the 
centralized fluid solution. We extend these results to the even more general situation where 
different users might have different (disjoint) files of different sizes to disseminate (Section 12)). 
This approach is suitable for typical and for large numbers of file parts M. Finally, we turn 
to decentralized algorithms. In Section |H1 we evaluate the performance of a very simple and 
natural randomized strategy, theoretically, by simulation and by direct computation. We 
provide results in two different information scenarios with equal capacities showing that even 
this naive algorithm disseminates the file in an expected time whose growth rate with N is 
similar to the growth rate of the minimal time that we have found for a centralized controller. 
This suggests that the performance of necessarily decentralized P2P file dissemination systems 
should still be close to the performance bounds of the previous sections so that they are useful 
in practice. We conclude and present ideas for further research in Section [Tj 

2 The Uplink-Sharing Model

We now introduce an abstract model for the file dissemination scenario described in the 
previous section, focusing on the important features of P2P file dissemination. 

Underlying the file dissemination system is the Internet. Thus, each user can connect to 
every other user and the network topology is a complete graph. The server S has upload 
capacity $C_S$ and the N peers have upload capacities $C_1, \ldots, C_N$, measured in megabytes
per second (MBps). Once a user has received a file part it can participate subsequently in 
uploading it to its peers (source availability). We suppose that, in principle, any number of 
users can simultaneously connect to the server or another peer, the available upload capacity 
being shared equally amongst the open connections (fair sharing). Taking the file size to be 
1 MB, this means that if n users try simultaneously to download a part of the file (of size 
1/M) from the server then it takes $n/(M C_S)$ seconds for these downloads to complete. Observe
that the rate at which an upload takes place can both increase and decrease during the time 
of that upload (varying according to the number of other uploads with which it shares the 
upload capacity), but we assume that uploads are not interrupted until complete, that is the 
rate is always positive (continuity). In fact, Lemma 1 below shows that the makespan is not
increased if we restrict the server and all peers to carry out only a single upload at a time. 
We permit a user to download more than one file part simultaneously, but these must be 
from different sources; only one file part may be transferred from one user to another at the 
same time. We ignore more complicated interactions and suppose that the upload capacities, 
$C_S, C_1, \ldots, C_N$, impose the only constraints on the rates at which file parts can be transferred
between peers which is a reasonable assumption if the underlying network is not overloaded. 
Finally, we assume that rates of uploads and downloads do not constrain one another. 
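
As a small illustration of the fair-sharing assumption, the following Python sketch computes how long n simultaneous downloads of one file part from a single source take. The function and argument names are ours and purely illustrative; the only modelling input is the rule stated above that the upload capacity is split equally among the open connections, with the file size normalized to 1 MB.

```python
def shared_download_time(n_downloads: int, num_parts: int, capacity: float) -> float:
    """Seconds needed for n simultaneous downloads of one file part.

    Assumes the file has size 1 MB, is split into num_parts equal parts,
    and the source's upload capacity (MB per second) is shared equally.
    """
    if n_downloads < 1 or num_parts < 1 or capacity <= 0:
        raise ValueError("need n_downloads >= 1, num_parts >= 1, capacity > 0")
    part_size = 1.0 / num_parts                    # size of one part in MB
    rate_per_connection = capacity / n_downloads   # fair share of the uplink
    return part_size / rate_per_connection         # equals n / (M * C)


# Four peers fetch one of 8 parts from a server with capacity 2 MBps.
print(shared_download_time(4, 8, 2.0))
```

The result agrees with the expression n/(M C_S) given in the text.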

Note that we have assumed the download rates to be unconstrained and this might be 
considered unrealistic. However, we shall show a posteriori in Section 3 that if the upload
capacities are equal then additional download capacity constraints do not increase the min- 
imum possible makespan, as long as these download capacities are at least as big. Indeed, 
this is usually the case in practice. 

Typically, N is of the order of several thousands and the file size is up to a few gigabytes
(GB), so that there are several thousand file parts of size 1/4 MB each.

Finding the minimal makespan looks potentially very hard as upload times are interde-
pendent and might start at arbitrary points in time. However, the following two observations 
help simplify it dramatically. As we see in the next section, they also relate the uplink-sharing 
model to the simultaneous send/receive broadcasting model. 

Lemma 1 In the uplink-sharing model the minimal makespan is not increased by restricting
attention to schedules in which the server and each of the peers only carry out a single upload 
at a time. 

Proof. Identify the server as peer 0 and, for each i = 0, 1, ..., N, consider the schedule of peer
i. We shall use the term job to mean the uploading of a particular file part to a particular
peer. Consider the set of jobs, say J, whose processing involves some sharing of the upload
capacity $C_i$. Pick any job, say j, in J which is last in J to finish and call the time at which
it finishes $t_f$. Now fair sharing and continuity imply that job j is amongst the last to start
amongst all the jobs finishing before or at time $t_f$. To see this, note that if some job k were
to start later than j, then (by fair sharing and continuity) k must receive less processing than
job j by time $t_f$ and so cannot have finished by time $t_f$. Let $t_s$ denote the starting time of
job j.

We now modify the schedule between time $t_s$ and $t_f$ as follows. Let K be the set of
jobs with which job j's processing has involved some sharing of the upload capacity. Let
us re-schedule job j so that it is processed on its own between times $t_f - 1/(C_i M)$ and $t_f$.
This consumes some amount of upload capacity that had been devoted to jobs in K between
$t_f - 1/(C_i M)$ and $t_f$. However, it releases an exactly equal amount of upload capacity between
times $t_s$ and $t_f - 1/(C_i M)$ which had been used by job j. This can now be allocated (using
fair sharing) to processing jobs in K.

The result is that j can be removed from the set J. All jobs finish no later than they did
under the original schedule. Moreover, job j starts later than it did under the original schedule
and the scheduling before time $t_s$ and after time $t_f$ is not affected. Thus, all jobs start no
earlier than they did under the original schedule. This ensures that the source availability
constraints are satisfied and that we can consider the upload schedules independently. We
repeatedly apply this argument until set J is empty. ■
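
The simplest instance of this lemma can be checked numerically. The sketch below is an illustration under our own simplifying assumption that k jobs of one uploader all start at time 0; it is not the general rescheduling argument of the proof. It compares the finishing times under fair sharing with those obtained when the same jobs are processed one at a time at full capacity. The function names are hypothetical.

```python
from fractions import Fraction


def finish_times_shared(k: int, M: int, C) -> list:
    """k jobs of size 1/M start together and share capacity C equally.

    Each job receives rate C/k throughout, so all finish at time k/(M*C).
    """
    return [Fraction(k) / (M * Fraction(C))] * k


def finish_times_sequential(k: int, M: int, C) -> list:
    """The same k jobs processed one at a time at full rate C.

    The i-th job (i = 1..k) finishes at time i/(M*C).
    """
    return [Fraction(i) / (M * Fraction(C)) for i in range(1, k + 1)]


shared = finish_times_shared(3, 4, 2)
sequential = finish_times_sequential(3, 4, 2)
# No job finishes later than under sharing, and the last one finishes at the same time.
print(sequential, shared)
```

In this special case every job finishes no later than before, which is the property the lemma establishes in general.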

Using Lemma 1, a similar argument shows the following result. 

Lemma 2 In the uplink-sharing model the minimal makespan is not increased by restricting
attention to schedules in which uploads start only at times that other uploads finish or at time 
0. 

Proof. By the previous lemma it suffices to consider schedules in which the server and each
of the peers only carry out a single upload at a time. Consider the joint schedule of all peers
i = 0, 1, ..., N and let J be the set of jobs that start at a time other than 0 at which no
other upload finishes. Pick a job, say j, that is amongst the first in J to start, say at time $t_s$.
Consider the greatest time $t_f$ such that $t_f < t_s$ and $t_f$ is either 0 or the time that some other
upload finishes, and modify the schedule so that job j already starts at time $t_f$.

The source availability constraints are still satisfied and all uploads finish no later than
they did under the original schedule. Job j can be removed from the set J and the number
of jobs in J that start at time $t_s$ is decreased by 1, although there might now be more (but at
most N in total) jobs in J that start at the time that job j finished in the original schedule.
But this time is later than $t_s$. Thus, we repeatedly apply this argument until the number
of jobs in J that start at time $t_s$ becomes 0 and then move along to jobs in J that are now
amongst the first in J to start, at some time $t_s' > t_s$. Note that once a job has been removed from
J, it will never be included again. Thus we continue until the set J is empty. ■

3 Centralized Solution for Equal Capacities 

In this section, we give the optimal centralized solution of the uplink-sharing model of the 
previous section with equal upload capacities. We first consider the simultaneous send/receive 
broadcasting model in which the server and all users have upload capacity of 1. The follow- 
ing theorem provides a formula for the minimal makespan and a centralized algorithm that 
achieves it is contained in the proof. 

This agrees with a result of Bar-Noy, Kipnis and Schieber [2], who obtained it as a by-product
of their result on the bidirectional telephone model. However, they required pairwise
matchings in order to apply the results from the telephone model. So, for the simultaneous 
send/receive model, too, they use perfect matching in each round for odd N, and perfect
matching on N − 2 nodes for even N. As a result, their algorithm differs for odd and even
N and it is substantially more complicated to describe, implement and prove to be correct
than the one we present within the proof of Theorem 1. Theorem 1 has been obtained also
by Kwon and Chwa [2j, via an algorithm for broadcasting in hypercubes. By contrast, our 
explicitly constructive proof makes the structure of the algorithm very easy to see. Moreover, 
it makes the proof of Theorem 2, that is, the result for the uplink-sharing model, a trivial
consequence (using Lemmata 1 and 2).

Essentially, the $\log_2 N$-scaling is due to the P2P approach. This compares favourably to
the linear scaling of N that we would obtain for a fixed set of servers. The factor of 1/M is 
due to splitting the file into parts. 

Theorem 1 In the simultaneous send/receive model with all upload and download capacities
equal to 1, the minimum number of rounds is $M + \lfloor \log_2 N \rfloor$, each round taking up 1/M units
of time. Equivalently, for all M, N, the minimal makespan is

$$T^* = 1 + \frac{\lfloor \log_2 N \rfloor}{M}. \qquad (1)$$
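
For readers who want to evaluate the bound numerically, here is a minimal Python sketch of the round count and makespan in Theorem 1. The function names are ours. The server-only baseline is an assumption we add for comparison: a single server of capacity 1 sending the whole file of size 1 to each of the N users in turn.

```python
from fractions import Fraction


def floor_log2(N: int) -> int:
    """Exact integer floor(log2(N)) for N >= 1."""
    if N < 1:
        raise ValueError("N must be at least 1")
    return N.bit_length() - 1


def min_rounds(M: int, N: int) -> int:
    """Minimum number of rounds M + floor(log2 N) from Theorem 1."""
    return M + floor_log2(N)


def min_makespan(M: int, N: int) -> Fraction:
    """Minimal makespan T* = 1 + floor(log2 N)/M; each round lasts 1/M."""
    return Fraction(min_rounds(M, N), M)


def server_only_makespan(N: int) -> Fraction:
    """Baseline assumption: one server of capacity 1 uploads the file N times."""
    return Fraction(N)


for M, N in [(2, 3), (4, 13), (1000, 5000)]:
    print(M, N, min_rounds(M, N), float(min_makespan(M, N)), server_only_makespan(N))
```

Exact rational arithmetic is used so that the values can be compared with the round counts quoted in the text without rounding error.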



Proof. Suppose that $N = 2^n - 1 + x$, for $x = 1, \ldots, 2^n$. So $n = \lfloor \log_2 N \rfloor$. The fact that M + n
is a lower bound on the number of rounds is straightforwardly seen as follows. There are M
different file parts and the server can only upload one file part (or one linear combination
of file parts) in each round. Thus, it takes at least M rounds until the server has made
sufficiently many uploads of file parts (or linear combinations of file parts) that the whole
file can be recovered. The last of these M uploads by the server contains information that
is essential to recovering the file, but this information is now known to only the server and
one peer. It must take at least n further rounds to disseminate this information to the other
N − 1 peers.
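
The last step of the lower bound can be checked by a short computation. The sketch below assumes, as in the argument above, that the information starts at two nodes (the server and one peer) and that the number of informed nodes can at most double in each round; it then counts the rounds needed to reach all N + 1 nodes. The function name is illustrative.

```python
def rounds_to_inform_all(N: int) -> int:
    """Fewest rounds for information held by the server and one peer
    to reach all N + 1 nodes when holders can at most double per round."""
    holders = 2      # the server and the single peer that just received it
    rounds = 0
    while holders < N + 1:
        holders *= 2  # every holder informs one new node
        rounds += 1
    return rounds


# The count coincides with n = floor(log2 N) for every N tested.
for N in (1, 2, 3, 4, 13, 1000):
    print(N, rounds_to_inform_all(N), N.bit_length() - 1)
```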

Now we show how the bound can be achieved. The result is trivial for M = 1. It is
instructive to consider the case M = 2 explicitly. If n = 0 then N = 1 and the result is
trivial. If n = 1 then N is 2 or 3. Suppose N = 3. In the following diagram each line
corresponds to a round; each column to a peer. The entries denote the file part that the peer
downloads that round. The bold entries indicate downloads from the server; un-bold entries
indicate downloads from a peer who has the corresponding part.

    1
        2   1
    2   1   2

Thus, dissemination of the two file parts to the 3 users can be completed in 3 rounds. The
case N = 2 is even easier.

If n ≥ 2, then in rounds 2 to n each user uploads his part to a peer who has no file part
and the server uploads part 2 to a peer who has no file part. We reach a point, shown below,
at which a set of $2^{n-1}$ peers have file part 1, a set of $2^{n-1} - 1$ peers have file part 2, and a
set of x peers have no file part (those denoted by * ··· *). Let us call these three sets $A_1$, $A_2$
and $A_0$, respectively.

    1
    2 1
    2 1 2 1
    2 1 2 1 2 1 2 1
    2 1 ··· 2 1 * ··· *

In round n + 1 we let peers in $A_1$ upload part 1 to $2^{n-1} - \lfloor x/2 \rfloor$ peers in $A_2$ and to $\lfloor x/2 \rfloor$
peers in $A_0$ (if x = 1, to $2^{n-1} - 1$ peers in $A_2$ and to 1 peer in $A_0$). Peers in $A_2$ upload part
2 to $2^{n-1} - \lceil x/2 \rceil$ peers in $A_1$ and to another $\lceil x/2 \rceil - 1$ peers in $A_0$. The server uploads part
2 to a member of $A_0$ (if x = 1, to a member of $A_1$). Thus, at the end of this round $2^n - x$
peers have both file parts, x peers have only file part 1, and x − 1 peers have only file part 2.
One more round (round n + 2) is clearly sufficient to complete the dissemination.

Now consider M ≥ 3. The server uploads part 1 to one peer in round 1. In rounds
j = 2, ..., min{n, M − 1}, each peer who has a file part uploads his part to another peer who
has no file part and the server uploads part j to a peer who has no file part. If M ≤ n, then
in rounds M to n each peer uploads his part to a peer who has no file part and the server
uploads part M to a peer who has no file part. As above, we illustrate this with a diagram.
Here we show the first n rounds in the case M ≤ n.

    1
    2 1
    3 1 2 1
    4 1 2 1 3 1 2 1
    M 1 ··· 2 1
    M 1 ··· 2 1 * ··· *
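
The first n rounds just described can be sketched in a few lines of Python. This is a counting simulation only, under the rules stated above: in every round each peer that holds a part passes it to a peer holding nothing, and the server hands part min{j, M} to one further empty peer in round j. It tracks how many peers hold each part rather than which peers they are, and the function name is our own.

```python
def replicas_after_first_phase(M: int, N: int) -> dict:
    """Number of peers holding each part at the end of round n = floor(log2 N).

    During these rounds every peer holds at most one part, so the counts
    also partition the peers that hold anything.
    """
    n = N.bit_length() - 1
    counts = {}  # part index -> number of peers holding that part
    for rnd in range(1, n + 1):
        # Every holder uploads its part to an empty peer: counts double.
        counts = {part: 2 * c for part, c in counts.items()}
        # The server gives part min(rnd, M) to one more empty peer.
        server_part = min(rnd, M)
        counts[server_part] = counts.get(server_part, 0) + 1
    return counts


print(replicas_after_first_phase(3, 13))  # n = 3
print(replicas_after_first_phase(2, 13))
```

For M ≤ n the counts are $2^{n-i}$ for parts i < M and $2^{n-M+1} - 1$ for part M, so that $2^n - 1$ peers hold exactly one part when round n ends.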

When round n ends, $2^n - 1$ peers have one file part and x peers have no file part. The number
of peers having file part i is given in the second column of Table 1. In this table any entry
which evaluates to less than 1 is to be read as 0 (so, for example, the bottom two entries in
column 2 and the bottom entry in column 3 are 0 for n = M − 2). Now in round n + 1,
by downloading from every peer who has a file part, and downloading part min{n + 1, M}
from the server, we can obtain the numbers shown in the third column. Moreover, we can
easily arrange so that peers can be divided into the sets $B_{12}$, $B_{1p}$, $B_1$, $B_2$ and $B_p$ as shown
in Table 2. In round n + 2, x − 1 of the peers in $B_1$ upload part 1 to peers in $B_2$ and $B_p$.
Peers in $B_{12}$ and $B_2$ each upload part 2 to the peers in $B_{1p}$ and to $\lceil x/2 \rceil$ of the peers in $B_1$.
The server and the peers in $B_{1p}$ and $B_p$ each upload a part other than 1 or 2 to the peers
in $B_{12}$ and to the other $\lfloor x/2 \rfloor$ peers in $B_1$. The server uploads part min{n + 2, M} and so
we obtain the numbers in the fourth column of Table 1. Now all peers have part 1 and so
it can be disregarded subsequently. Moreover, we can make the downloads from the server,
$B_{1p}$ and $B_p$ so that (disregarding part 1) the number of peers who ultimately have only part
3 is $\lfloor x/2 \rfloor$. This is possible because the size of $B_p$ is no more than $\lfloor x/2 \rfloor$; so if j peers in
$B_p$ have part 3 then we can upload part 3 to exactly $\lfloor x/2 \rfloor - j$ peers in $B_1$. Thus, a similar
partitioning into sets as in Table 2 will hold as we start step n + 3 (when parts 2 and 3 take
over the roles of parts 1 and 2 respectively).

Part     Numbers of the file parts at the ends of rounds
         n                  n + 1              n + 2              n + 3              ...   n + M − 1
1        $2^{n-1}$          $2^n$              N                  N                  ...   N
2        $2^{n-2}$          $2^{n-1}$          $2^n$              N                  ...   N
3        $2^{n-3}$          $2^{n-2}$          $2^{n-1}$          $2^n$              ...   N
4        $2^{n-4}$          $2^{n-3}$          $2^{n-2}$          $2^{n-1}$          ...   N
...
M − 2    $2^{n-M+2}$        $2^{n-M+3}$        $2^{n-M+4}$        $2^{n-M+5}$        ...   N
M − 1    $2^{n-M+1}$        $2^{n-M+2}$        $2^{n-M+3}$        $2^{n-M+4}$        ...   $2^n$
M        $2^{n-M+1} - 1$    $2^{n-M+2} - 1$    $2^{n-M+3} - 1$    $2^{n-M+4} - 1$    ...   $2^n - 1$

Table 1: Number of file part replicas as obtained with our algorithm.

Set        Peers in the set have                  Number of peers in set
$B_{12}$   parts 1 and 2                          $2^{n-1} - \lfloor x/2 \rfloor$
$B_{1p}$   part 1 and a part other than 1 or 2    $2^{n-1} - \lceil x/2 \rceil$
$B_1$      just part 1                            $x$
$B_2$      just part 2                            $\lfloor x/2 \rfloor$
$B_p$      just a part other than 1 or 2          $\lceil x/2 \rceil - 1$

Table 2: File parts held by various sets of peers at the end of round n + 1.

We continue similarly in subsequent rounds, until at the end of round n + M − 1, all peers
have parts 1, ..., M − 2, $2^n - x$ peers also have both part M − 1 and part M, x peers also
have only part M − 1, and x − 1 peers also have only part M. It now takes just a final round
to ensure that all peers have parts M − 1 and M. ■

Note that the optimal strategy above follows two principles. As many different peers 
as possible obtain file parts early on so that they can start uploading themselves and the 
maximal possible upload capacity is used. Moreover, there is a certain balance in the upload 
of different file parts so that no part gets circulated too late. 

It is interesting that not all the available upload capacity is used. Suppose M ≥ 2.
Observe that in round k, for each k = n + 2, ..., n + M − 1, only x − 1 of the x peers (in
set $B_1$) who have only file part k − n − 1 make an upload. This happens M − 2 times. Also,
in round n + M there are only 2x − 1 uploads, whereas N + 1 are possible. Overall, we use
N + M − 2x fewer uploads than we might. It can be checked that this number is the same for
M = 1.
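
The count of unused uploads can be cross-checked by simple bookkeeping. The sketch below rests on two assumptions of ours about how to tally capacity: in rounds 1 to n + 1 the number of nodes able to upload is $2^{r-1}$ in round r (the server plus the peers already holding a part), and in each of the remaining M − 1 rounds all N + 1 nodes can upload. Exactly MN uploads are needed in total, since every peer must receive every part once. Function names are illustrative.

```python
def unused_uploads(M: int, N: int) -> int:
    """Upload slots available over M + n rounds minus the M*N uploads needed."""
    n = N.bit_length() - 1
    # Rounds 1..n+1: the server plus all peers holding a part can upload.
    possible = sum(2 ** (r - 1) for r in range(1, n + 2))
    # Rounds n+2..n+M: every one of the N + 1 nodes holds something.
    possible += (M - 1) * (N + 1)
    return possible - M * N


def unused_uploads_closed_form(M: int, N: int) -> int:
    """The expression N + M - 2x from the text, with x = N + 1 - 2^n."""
    n = N.bit_length() - 1
    x = N + 1 - 2 ** n
    return N + M - 2 * x


print(unused_uploads(4, 13), unused_uploads_closed_form(4, 13))
```

The two quantities agree for all M ≥ 1 and N ≥ 1 that we tried, including M = 1.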

Suppose we were to follow a schedule that uses only x uploads during round n + 1, when
the last peer gets its first file part. We would be using $2^n - x$ fewer uploads than we might
in this round. Since $2^n - x \le N + M - 2x$, we see that the schedule used in the proof
above wastes at least as many uploads. So the mathematically interesting question arises
as to whether or not it is necessary to use more than x uploads in round n + 1. In fact,
$(N + M - 2x) - (2^n - x) = M - 1$, so, in terms of the total number of uploads, such a
scheduling could still afford not to use one upload during each of the last M − 1 rounds. The
question is whether or not each file part can be made available sufficiently often.

The following example shows that if we are not to use more than x uploads in round n + 1
we will have to do something quite subtle. We cannot simply pick any x out of the $2^n$ uploads
possible and still hope that an optimal schedule will be shiftable, by which we mean that the
number of copies of part j at the end of round k will be the same as the number of copies of
part j − 1 at the end of round k − 1. It is the fact that the optimal schedule used in Theorem 1
is shiftable that makes its optimality so easy to see.

Example 1 Suppose M = 4 and $N = 13 = 2^3 + 6 - 1$, so $M + \lfloor \log_2 N \rfloor = 7$. If we follow
the same schedule as in Theorem 1 we reach, after round 3,

    1 2 1 3 1 2 1

Now if we only make x = 6 uploads during round 4, then there are eight ways to choose
which six parts to upload and which two parts not to upload. One can check that in no case
is it possible to arrange so that once this is done and uploads are made for round 5 then the
resulting state has the same numbers of parts 2, 3 and 4, respectively, as the numbers of parts
1, 2 and 3 at the end of round 4. That is, there is no shiftable optimal schedule. In fact, if
our six uploads had been four part 1s and two part 2s, then it would not even be possible to
achieve (1).
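
The count of eight in this example can be reproduced by enumeration. The sketch below assumes that the eight uploads possible in round 4 are the parts 1, 2, 1, 3, 1, 2, 1 held by the seven peers plus part 4 from the server, and lists the distinct pairs of parts that could be left out. A second helper, also ours, bounds how many peers can hold a part that no peer has yet when only a given number of rounds remain.

```python
from itertools import combinations

# Parts held by the seven peers after round 3, plus part 4 offered by the server.
peer_parts = [1, 2, 1, 3, 1, 2, 1]
available_uploads = peer_parts + [4]


def ways_to_omit_two(uploads):
    """Distinct unordered pairs of part labels that can be left un-uploaded."""
    return sorted({tuple(sorted(pair)) for pair in combinations(uploads, 2)})


def max_copies_from_scratch(rounds_left: int) -> int:
    """Most peers that can hold a part currently held by no peer, if in every
    remaining round all of its holders and the server upload it."""
    copies = 0
    for _ in range(rounds_left):
        copies = 2 * copies + 1
    return copies


print(ways_to_omit_two(available_uploads))
print(max_copies_from_scratch(3))
```

If parts 3 and 4 are the two omitted in round 4, part 4 has no copy among the peers with three rounds left, and the second helper shows that it can then reach at most 7 of the 13 peers by round 7.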

In some cases, we can achieve (1) if we relax the demand that the schedule be shiftable.
Indeed, we conjecture that this is always possible for at least one schedule that uses only 
x uploads during round n + 1. However, the fact that we cannot use essentially the same 
strategy in each round makes the general description of a non-shiftable optimal schedule 
very complicated. Our aim has been to find an optimal (shiftable) schedule that is easy to 
describe. We have shown that this is possible if we do use the spare capacity at round n + 1. 
For practical purposes this is desirable anyway, since even if it does not affect the makespan 
it is better if users obtain file parts earlier. 

When $x = 2^n$ our schedule can be realized using matchings between the $2^n$ peers holding
the part that is to be completed next and the server together with the $2^n - 1$ peers holding
the remaining parts. But otherwise it is not always possible to schedule only with matchings.
This is why our solution would not work for the more constrained telephone-like model
considered in [2] (where, in fact, the answer differs as N is even or odd).

The solution of the simultaneous send/receive broadcasting model problem now gives the
solution of our original uplink-sharing model when all capacities are the same. 

Theorem 2 Consider the uplink-sharing model with all upload capacities equal to 1. The
minimal makespan is given by (1), for all M, N, the same as in the simultaneous send/receive
model with all upload capacities equal to 1.

Proof. Note that under the assumptions of the theorem and with application of Lemmas 1
and 2, the optimal solution to the uplink-sharing model is the same as that of the simultaneous
send/receive broadcast model when all upload capacities are equal to 1. ■

In the proof of Theorem 1 we explicitly gave an optimal schedule which also satisfies the
constraints that no peer downloads more than a single file part at a time. Thus, we also have 
the following result. 

Theorem 3 In the uplink-sharing model with all upload capacities equal to 1, constraining
the peers' download rates to 1 does not further increase the minimal makespan.

4 Centralized Solution for General Capacities 

We now consider the optimal centralized solution in the general case of the uplink-sharing 
model in which the upload capacities may be different. Essentially, we have an unusual type 
of precedence-constrained job scheduling problem. In Section 4.1 we formulate it as a mixed
integer linear program (MILP). The MILP can also be used to find approximate solutions 
of bounded size of sub-optimality. In practice, it is suitable for a small number of file parts 
M. We discuss its implementation in Section 4.2. Finally, we provide additional insight
into the solution with different capacities by considering special choices for N and M when 
$C_1 = C_2 = \cdots = C_N$, but $C_S$ might be different (Sections Ol and IO|).

4.1 MILP formulation 

In order to give the MILP formulation, we need the following Lemma. Essentially, it shows 
that time can be discretized suitably. 

Lemma 3 Consider the uplink-sharing model and suppose all uplink capacities are integer
multiples of a common time unit. Then there exists $\tau$ such that under an optimal schedule
all uploads start and finish at integer multiples of $\tau$.

Proof. Rescale time so that $C_S, C_1, \ldots, C_N$ are all integers and let L be their least common
multiple. The time that the first job completes must be an integer multiple of 1/(ML). All
remaining jobs are of sizes 1/M or $1/M - (1/(M C_j)) C_i$ for various $C_i < C_j$. These are
also integer multiples of 1/(ML). Repeating this, we find that the time that the second job
completes, and the lengths of all remaining jobs at this point, must be integer multiples of
$1/(ML)^2$. Repeating further, we find that $\tau = 1/(ML)^{MN}$ suffices. ■

We next show how the solution to the general problem can be found by solving a number
of linear programs. Let time interval $t$ be the interval $[t\tau, t\tau + \tau)$, $t = 0, 1, \ldots$. Identify the
server as peer 0. Let $x_{ijk}(t)$ be 1 or 0 as peer $i$ downloads file part $k$ from peer $j$ during
interval $t$ or not. Let $p_{ik}(t)$ denote the proportion of file part $k$ that peer $i$ has downloaded
by time $t$. Our problem is then to find the minimal $T$ such that the optimal value of the
following MILP is $MN$. Since this $T$ is certainly greater than $1/C_S$ and less than $N/C_S$, we
can search for its value by a simple bisection search, solving this LP for various $T$:

$$\text{maximize} \quad \sum_{i,k} p_{ik}(T) \tag{2}$$

subject to the constraints given below. The source availability constraint (6) guarantees that
a user has completely downloaded a part before he can upload it to his peers. The connection
constraint (7) requires that each user only carries out a single upload at a time. This is
justified by Lemma ^ which also saves us another essential constraint and variable to control 
the actual download rates: the single user downloading from peer $j$ at time $t$ will do so at
rate $C_j$, as expressed in the link constraint (5). The continuity and stopping constraints (8, 9)
require that a download that has started will not be interrupted until completion and will then be
stopped. The exclusivity constraint (10) ensures that each user downloads a given file part
from only one peer, not from several. Stopping and exclusivity constraints are not based
on assumptions, but are obvious constraints that exclude redundant uploads.

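Regional constraints

$$x_{ijk}(t) \in \{0, 1\} \quad \text{for all } i, j, k, t \tag{3}$$
$$p_{ik}(t) \in [0, 1] \quad \text{for all } i, k, t \tag{4}$$

Link constraints between variables

$$p_{ik}(t) = M\tau \sum_{t'=0}^{t-\tau} \sum_{j=0}^{N} x_{ijk}(t')\, C_j \quad \text{for all } i, k, t \tag{5}$$

Essential constraints

$$x_{ijk}(t) - \xi_{jk}(t) \le 0 \quad \text{for all } i, j, k, t \quad \text{(source availability constraint)} \tag{6}$$
$$\sum_{i,k} x_{ijk}(t) \le 1 \quad \text{for all } j, t \quad \text{(connection constraint)} \tag{7}$$
$$x_{ijk}(t) - \xi_{ik}(t+1) - x_{ijk}(t+1) \le 0 \quad \text{for all } i, j, k, t \quad \text{(continuity constraint)} \tag{8}$$
$$x_{ijk}(t) + \xi_{ik}(t) \le 1 \quad \text{for all } i, j, k, t \quad \text{(stopping constraint)} \tag{9}$$
$$\sum_{j} x_{ijk}(t) \le 1 \quad \text{for all } i, k, t \quad \text{(exclusivity constraint)} \tag{10}$$

Initial conditions

$$p_{0k}(0) = 1 \quad \text{for all } k \tag{11}$$
$$p_{ik}(0) = 0 \quad \text{for all } i, k \tag{12}$$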

Constraints (6)–(9) have been linearized. Background can be found in [H^. For this, we used
the auxiliary variable $\xi_{ik}(t) = \mathbf{1}\{p_{ik}(t) = 1\}$. This definition can be expressed through the
following linear constraints.

Linearization constraints

$$\xi_{ik}(t) \in \{0, 1\} \quad \text{for all } i, k, t \tag{13}$$
$$p_{ik}(t) - \xi_{ik}(t) \ge 0 \quad \text{and} \quad p_{ik}(t) - \xi_{ik}(t) < 1 \quad \text{for all } i, k, t \tag{14}$$

It can be checked that together with (6)–(9), indeed, this gives

$$x_{ijk}(t) = 1 \text{ and } p_{ik}(t+1) < 1 \;\Longrightarrow\; x_{ijk}(t+1) = 1 \quad \text{for all } i, j, k, t \tag{15}$$
$$p_{ik}(t) = 1 \;\Longrightarrow\; x_{ijk}(t) = 0 \quad \text{for all } i, j, k, t \tag{16}$$
$$p_{jk}(t) < 1 \;\Longrightarrow\; x_{ijk}(t) = 0 \quad \text{for all } i, j, k, t \tag{17}$$

that is, continuity, stopping and source availability constraints respectively.
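
The outer bisection search over T can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: `all_delivered` is a hypothetical oracle standing in for one MILP solve with a horizon of T slots of length $\tau$ (it should report whether the optimal objective equals MN), and the toy oracle at the end is invented purely to make the sketch runnable.

```python
from typing import Callable


def minimal_slots(all_delivered: Callable[[int], bool], lo: int, hi: int) -> int:
    """Smallest horizon T (in slots) within [lo, hi] for which all_delivered(T) holds.

    all_delivered is assumed to be monotone: once every peer can be served
    within T slots, the same is true for every larger horizon.
    """
    if not all_delivered(hi):
        raise ValueError("upper end of the search range is not feasible")
    while lo < hi:
        mid = (lo + hi) // 2
        if all_delivered(mid):
            hi = mid          # feasible: the optimum is mid or smaller
        else:
            lo = mid + 1      # infeasible: the optimum is larger than mid
    return lo


# Toy stand-in for the MILP oracle (hypothetical): pretend 13 slots are needed.
print(minimal_slots(lambda slots: slots >= 13, lo=1, hi=64))
```

Working in whole slots rather than in real time keeps the search finite: each iteration halves the range, so about $\log_2(hi - lo)$ oracle calls are needed.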

4.2 Implementation of the MILP 

MILPs are well-understood and there exist efficient computational methods and program 
codes. The simplex method introduced by Dantzig in 1947, in particular, has been found 
to yield an efficient algorithm in practice as well as providing insight into the theory. Since 
then, the method has been specialized to take advantage of the particular structure of certain 
classes of problems and various interior point methods have been introduced. For integer 
programming there are branch-and-bound, cutting plane (branch-and-cut) and column gen- 
eration (branch-and-price) methods as well as dynamic programming algorithms. Moreover, 
there are various approximation algorithms and heuristics. These methods have been im- 
plemented in many commercial optimization libraries such as OSL or CPLEX. For further 
reading on these issues the reader is referred to [2H] , |3 and [HHl • 

Thus, implementing and solving the MILPs gives the minimal makespan solution. However,
as the number of variables and constraints in the LP grows exponentially in N and
M, this approach is not practical for large N and M.
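
To get a feel for this growth, the following sketch counts the binary variables $x_{ijk}(t)$ for a given instance. It rests on assumptions that are not spelled out in the text: the slot length is taken as $\tau = 1/(ML)^{NM}$ with $L$ the least common multiple of the integer capacities (one factor of $ML$ per job, as suggested by the proof of Lemma 3), the horizon is the upper bound $N/C_S$, and all index combinations $i \in \{1,\ldots,N\}$, $j \in \{0,\ldots,N\}$ are counted. The function names are hypothetical.

```python
from fractions import Fraction
from math import lcm


def slot_length(capacities, M):
    """Assumed slot length tau = 1/(M*L)**(N*M); capacities[0] is the server."""
    N = len(capacities) - 1
    L = lcm(*capacities)
    return Fraction(1, (M * L) ** (N * M))


def binary_variable_count(capacities, M):
    """Rough upper count of the x_ijk(t) variables over the horizon N / C_S."""
    N = len(capacities) - 1
    slots = Fraction(N, capacities[0]) / slot_length(capacities, M)
    return N * (N + 1) * M * int(slots)


for caps, M in [([1, 1, 1], 2), ([2, 1, 1], 2), ([2, 1, 1, 1], 3)]:
    print(caps, M, binary_variable_count(caps, M))
```

Even three peers and three file parts already give an astronomically large count under these assumptions, which is why the coarser discretization discussed next is of interest.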

Even so, we can use the LP formulation to obtain a bounded approximation to the solution.
If we look at the problem with a greater $\tau$, then the job end and start times are not guaranteed
to lie at integer multiples of $\tau$. However, if we imagine that each job does take until the end of
a $\tau$-length interval to finish (rather than finishing before the end), then we will overestimate
the time that each job takes by at most $\tau$. Since there are NM jobs in total, we overestimate
the total time taken by at most $NM\tau$. Thus, the approximation gives us an upper bound on
the time taken that is at most $NM\tau$ greater than the true optimum. So we obtain both upper
and lower bounds on the minimal makespan. Even for this approximation, the computing
required is formidable for large N and M.

4.3 Insight for special cases with small N and M

We now provide some insight into the minimal makespan solution with different capacities
by considering special choices for N and M when $C_1 = C_2 = \cdots = C_N$, but $C_S$ might be
different. This addresses the case of the server having a significantly higher upload capacity
than the end users.

Suppose N = 2 and M = 1, that is, the file has not been split. Only the server has
the file initially, thus either (a) both peers download from the server, in which case the
makespan is $T = 2/C_S$, or (b) one peer downloads from the server and then the second peer
downloads from the first; in this case $T = 1/C_S + 1/C_1$. Thus, the minimal makespan is
$T^* = 1/C_S + \min\{1/C_S, 1/C_1\}$.

If N = M = 2 we can again adopt a brute force approach. There are 16 possible cases,
each specifying the download source that each peer uses for each part. These can be reduced
to four by symmetry.

Case A: Everything is downloaded from the server. This is effectively the same as case (a)
above. When $C_1$ is small compared to $C_S$, this is the optimal strategy.

Case B: One peer downloads everything from the server. The second peer downloads from
the first. This is as case (b) above, but since the file is split in two, T is less.

Case C: One peer downloads from the server. The other peer downloads one part of the file
from the server and the other part from the first peer.

Case D: Each peer downloads exactly one part from the server and the other part from the
other peer. When $C_1$ is large compared to $C_S$, this is the optimal strategy.

In each case, we can find the optimal scheduling and hence the minimal makespan. This
is shown in Table 3.

case   makespan

A      $2/C_S$
B      $1/(2C_S) + 1/(2C_1) + \max\{1/(2C_S),\, 1/(2C_1)\}$
C      $1/(2C_S) + \max\{1/C_S,\, 1/(2C_1)\}$
D      $1/C_S + 1/(2C_1)$

Table 3: Minimal makespan in the four possible cases when N = M = 2.

The optimal strategy arises from A, C or D as $C_1/C_S$ lies in the intervals $[0, 1/3]$, $[1/3, 1]$
or $[1, \infty)$ respectively. In $[1, \infty)$, B and D yield the same. See Figure 1. Note that under the
optimal schedule for case C one peer has to wait while the other starts downloading. This
illustrates that greedy-type distributed algorithms may not be optimal and that restricting
uploaders to a single upload is sometimes necessary for an optimal scheduling (cf . Section ^ . 
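
The case analysis for N = M = 2 can be checked numerically with a short sketch. The four makespan expressions used here are an assumption of this example: they are obtained by scheduling the uploads in Cases A to D by hand, with each half-file taking $1/(2C_S)$ from the server and $1/(2C_1)$ from a peer, and the function names are hypothetical.

```python
def makespans_two_peers_two_parts(c_s: float, c_1: float) -> dict:
    """Makespan of each of the four download patterns for N = M = 2."""
    half_s, half_1 = 1 / (2 * c_s), 1 / (2 * c_1)   # time to upload one part
    return {
        "A": 4 * half_s,                              # server does all four uploads
        "B": half_s + half_1 + max(half_s, half_1),   # second peer fed by the first
        "C": half_s + max(2 * half_s, half_1),        # server does three uploads
        "D": 2 * half_s + half_1,                     # peers swap their parts
    }


def best_cases(c_s: float, c_1: float, tol: float = 1e-12) -> list:
    """Labels of all cases that attain the smallest makespan."""
    times = makespans_two_peers_two_parts(c_s, c_1)
    best = min(times.values())
    return sorted(label for label, t in times.items() if t <= best + tol)


for ratio in (0.25, 0.5, 2.0):
    print(ratio, best_cases(1.0, ratio))
```

With $C_S = 1$ the three sample ratios fall into the three intervals named in the text, so the sketch can be used to see where the optimal pattern switches.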

Figure 1: Minimal makespan as a function of $C_1/C_S$ in the four possible cases when N = M = 2.

4.4 Insight for special cases with large M

We still assume $C_1 = C_2 = \cdots = C_N$, but $C_S$ might be different. In the limiting case that the
file can be divided into infinitely many parts, the problem can be easily solved for any number
of users. Let each user download a fraction $1 - \alpha$ directly from the server at rate $C_S/N$ and
a fraction $\alpha/(N-1)$ from each of the other $N - 1$ peers, at rate $\min\{C_S/N,\, C_1/(N-1)\}$ from
each. The makespan is minimized by choosing $\alpha$ such that the times for these two downloads
are equal, if possible. Equating them, we find the minimal makespan as follows.

Case 1: $C_1/(N-1) \le C_S/N$:

$$\frac{(1-\alpha)N}{C_S} = \frac{\alpha}{C_1} \quad\Longrightarrow\quad \alpha = \frac{NC_1}{C_S + NC_1} \quad\Longrightarrow\quad T = \frac{N}{C_S + NC_1} \tag{18}$$

Case 2: $C_1/(N-1) \ge C_S/N$:

$$\frac{(1-\alpha)N}{C_S} = \frac{\alpha N}{(N-1)C_S} \quad\Longrightarrow\quad \alpha = \frac{N-1}{N} \quad\Longrightarrow\quad T = \frac{1}{C_S} \tag{19}$$

In total, there are N MB to upload and the total available upload capacity is $C_S + NC_1$
MBps. Thus, a lower bound on the makespan is $N/(C_S + NC_1)$ seconds. Moreover, the
server has to upload his file to at least one user. Hence another lower bound on the makespan
is $1/C_S$. The former bound dominates in Case 1 and we have shown that it can be achieved.
The latter bound dominates in Case 2 and we have shown that it can be achieved. As a result,
the minimal makespan is

$$T^* = \max\left\{ \frac{1}{C_S},\; \frac{N}{C_S + NC_1} \right\}. \tag{20}$$

Figure 2 shows the minimal makespan when the file is split in 1, 2 and infinitely many file
parts when N = 2. It illustrates how the makespan decreases with M.
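
As a small numerical companion to the two cases, the sketch below computes the split $\alpha$ and the resulting makespan for N identical peers and compares it with the larger of the two lower bounds, $1/C_S$ and $N/(C_S + NC_1)$. The function name is hypothetical and the code assumes $N \ge 2$.

```python
def fluid_makespan_equal_peers(n: int, c_s: float, c_1: float):
    """Return (alpha, T) in the large-M limit for n >= 2 peers of capacity c_1."""
    if c_1 / (n - 1) <= c_s / n:
        # Case 1: the peers' uplinks are the bottleneck for peer-to-peer traffic.
        alpha = n * c_1 / (c_s + n * c_1)
        makespan = n / (c_s + n * c_1)
    else:
        # Case 2: the server's uplink is the bottleneck.
        alpha = (n - 1) / n
        makespan = 1 / c_s
    return alpha, makespan


for c_1 in (0.25, 0.5, 1.0, 2.0):
    alpha, makespan = fluid_makespan_equal_peers(2, 1.0, c_1)
    lower_bound = max(1 / 1.0, 2 / (1.0 + 2 * c_1))
    print(c_1, alpha, makespan, lower_bound)
```

In every row the achieved makespan coincides with the larger lower bound, which is the content of the argument above.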



Figure 2: Minimal makespan as a function of $C_1/C_S$ for different values of M when N = 2.



In the next section, we extend the results in this limiting case to a much more general
scenario.

5 Centralized Fluid Limit Solution

In this section, we generalize the results of Section 4.4 to allow for general capacities $C_i$.
Moreover, instead of limiting the number of sources to one designated server with a file to
disseminate, we now allow every user $i$ to have a file that is to be disseminated to all other
users. We provide the centralized solution in the limiting case that the file can be divided
into infinitely many parts.

Let $F_i \ge 0$ denote the size of the file that user $i$ disseminates to all other users. Seeing
that in this situation there is no longer one particular server and everything is symmetric,
we change notation for the rest of this section so that there are $N \ge 2$ users $1, 2, \ldots, N$.
Moreover, let $F = \sum_{i=1}^{N} F_i$ and $C = \sum_{i=1}^{N} C_i$. We will prove the following result.

Theorem 4 In the fluid limit, the minimal makespan is

$$T^* = \max\left\{ \frac{F_1}{C_1}, \frac{F_2}{C_2}, \ldots, \frac{F_N}{C_N}, \frac{(N-1)F}{C} \right\} \tag{21}$$

and this can be achieved with a two-hop strategy, i.e., one in which user $i$'s file is uploaded
to user $j$ either directly from user $i$ or via at most one intermediate user.

Proof. The result is obvious for N = 2. Then the minimal makespan is $\max\{F_1/C_1, F_2/C_2\}$
and this is exactly the value of $T^*$ in (21).

So we consider $N \ge 3$. It is easy to see that each of the $N + 1$ terms within the braces
on the right hand side of (21) is a lower bound on the makespan. Each user has to upload
his file to at least one user, which takes time $F_i/C_i$. Moreover, the total volume of files to be
uploaded is $(N-1)F$ and the total available capacity is $C$. Thus, the makespan is at least
$T^*$, and it remains to be shown that a makespan of $T^*$ can be achieved. There are two cases
to consider.

Case 1: $(N-1)F/C \ge F_i/C_i$ for all $i$.

In this case, $T^* = (N-1)F/C$. Let us consider the 2-hop strategy in which each user uploads
a fraction $\alpha_{ii}$ of its file $F_i$ directly to all $(N-1)$ peers, simultaneously and at equal rates.
Moreover, he uploads a fraction $\alpha_{ij}$ to peer $j$, who in turn then uploads it to the remaining
$(N-2)$ peers, again simultaneously and at equal rates. Note that $\sum_{j=1}^{N} \alpha_{ij} = 1$.

Explicitly constructing a suitable set $\alpha_{ij}$, we thus obtain the problem

$$\min T \tag{22}$$

subject to, for all $i$,

$$\frac{1}{C_i}\left[ \alpha_{ii}F_i(N-1) + \sum_{k \ne i} \alpha_{ik}F_i + \sum_{k \ne i} \alpha_{ki}F_k(N-2) \right] \le T. \tag{23}$$

We minimize $T$ by choosing the $\alpha_{ij}$ in such a way as to equate the $N$ left hand sides of the
constraints, if possible. Rewriting the expression in square brackets, equating the constraints
for $i$ and $j$ and then summing over all $j$ we obtain

$$C\left[ \alpha_{ii}F_i(N-2) + F_i + \sum_{k \ne i} \alpha_{ki}F_k(N-2) \right] = C_i\left[ (N-2)\sum_{j} \alpha_{jj}F_j + F + (N-2)\Big(F - \sum_{j} \alpha_{jj}F_j\Big) \right] = (N-1)C_iF. \tag{24}$$

Thus,

$$\alpha_{ii}F_i(N-2) + F_i + \sum_{k \ne i} \alpha_{ki}F_k(N-2) = (N-1)\frac{C_i}{C}F. \tag{25}$$



Note that there is a lot of freedom in the choice of the $\alpha_{ij}$, so let us specify that we require $\alpha_{ki}$
to be constant in $k$ for $k \ne i$, that is $\alpha_{ki} = \alpha_i^*$ for $k \ne i$. This means that if $i$ has the capacity
to take over a certain part of the dissemination from some peer, then it can and will also take
over the same proportion from any other peer. Put another way, user $i$ splits excess capacity
equally between its peers. Thus,

$$\alpha_{ii}F_i(N-2) + F_i + \alpha_i^*(N-2)(F - F_i) = (N-1)\frac{C_i}{C}F. \tag{26}$$

Still, we have twice as many variables as constraints. Let us also specify that $\alpha_i^* = \alpha_{ii}$ for all
$i$. Similarly as above, this says that the proportion of its own file $F_i$ that $i$ uploads to all its
peers (rather than just to one of them) is the same as the proportion of the files that it takes
over from its peers. Then

$$\alpha_i^* = \frac{(N-1)(C_i/C)F - F_i}{(N-2)F} = \frac{(N-1)C_i}{(N-2)C} - \frac{F_i}{(N-2)F}, \tag{27}$$

where $\sum_i \alpha_i^* = 1$ and $\alpha_i^* \ge 0$, because in Case 1 $F_i/C_i \le (N-1)F/C$.

With these $\alpha_{ij}$, we obtain the time for $i$ to complete its upload, and hence the time for
everyone to complete their upload, as

$$\begin{aligned} T &= \frac{1}{C_i}\left[ \alpha_i^*F_i(N-2) + F_i + \sum_{k \ne i} \alpha_i^*F_k(N-2) \right] \\ &= \frac{(N-1)F_i}{C} - \frac{F_i^2}{C_iF} + \frac{F_i}{C_i} + \frac{(N-1)(F-F_i)}{C} - \frac{F_i(F-F_i)}{C_iF} \\ &= (N-1)F/C. \end{aligned} \tag{28}$$

Note that there is no problem with precedence constraints. All uploads happen simultaneously,
stretched out from time 0 to $T$. User $i$ uploads to $j$ a fraction $\alpha_{ij}$ of $F_i$. Thus, he does so at
constant rate $\alpha_{ij}F_i/T_i = \alpha_{ij}F_i/T$. User $j$ passes on the same amount of data to each of the
other users in the same time, hence at the same rate $\alpha_{ij}F_i/T_j = \alpha_{ij}F_i/T$.

Thus, we have shown that if the aggregate lower bound dominates the others, it can be
achieved. It remains to be shown that if an individual lower bound dominates, then this can
be achieved also.

Case 2: $F_i/C_i > (N-1)F/C$ for some $i$.

By contradiction it is easily seen that this cannot be the case for all $i$. Let us order the users
in decreasing order of $F_i/C_i$, so that $F_1/C_1$ is the largest of the $F_i/C_i$. We wish to show that
all files can be disseminated within a time of $F_1/C_1$. To do this we construct new capacities
$C_i'$ with the following properties:

$$C_1' = C_1, \tag{29}$$
$$C_i' \le C_i \quad \text{for } i \ne 1, \tag{30}$$
$$(N-1)F/C' = F_1/C_1' = F_1/C_1 \quad \text{and} \tag{31}$$
$$F_i/C_i' \le F_1/C_1. \tag{32}$$

This new problem satisfies the condition of Case 1 and so the minimal makespan is $T' =
F_1/C_1'$. Hence the minimal makespan in the original problem is $T = F_1/C_1$ also, because the
unprimed capacities are greater than or equal to the primed capacities by property (30).

To explicitly construct capacities satisfying (29)–(32), let us define

$$C_i' = (N-1)\frac{C_1}{F_1}\gamma_iF_i, \tag{33}$$

with constants $\gamma_i \ge 0$ such that

$$\sum_i \gamma_iF_i = F. \tag{34}$$

Then $(N-1)F/C' = F_1/C_1$, that is (31) holds. Moreover, choosing

$$\gamma_i \le \frac{1}{N-1}\,\frac{C_iF_1}{C_1F_i} \tag{35}$$

ensures $C_i' \le C_i$, i.e. property (30), and choosing

$$\gamma_i \ge \frac{1}{N-1} \tag{36}$$

ensures $F_i/C_i' \le F_1/C_1$, that is property (32). Furthermore, the previous two conditions
together ensure that $\gamma_1 = 1/(N-1)$ and thus $C_1' = C_1$, that is property (29). It remains to
construct a set of parameters $\gamma_i$ that satisfies (34), (35) and (36).

Putting all $\gamma_i$ equal to the lower bound (36) gives $\sum_i \gamma_iF_i = F/(N-1)$; that is too small to
satisfy (34). Putting all $\gamma_i$ equal to the upper bound (35) gives $\sum_i \gamma_iF_i = F_1C/((N-1)C_1)$, that
is too large to satisfy (34). So we pick a suitably weighted average instead. Namely,

$$\gamma_i = \delta\,\frac{1}{N-1}\,\frac{C_iF_1}{C_1F_i} + (1-\delta)\,\frac{1}{N-1} \tag{37}$$

such that

$$\sum_i \gamma_iF_i = \delta\,\frac{1}{N-1}\,\frac{F_1}{C_1}C + (1-\delta)\,\frac{F}{N-1} = F, \tag{38}$$

that is

$$\delta = \frac{(N-2)FC_1}{F_1C - FC_1}. \tag{39}$$

Substituting back in we obtain

$$\gamma_i = \frac{1}{N-1}\,\frac{(N-2)FF_1C_i + F_iF_1C - (N-1)FF_iC_1}{(F_1C - FC_1)F_i} \tag{40}$$

and thus

$$C_i' = \frac{C_1}{F_1}\,\frac{(N-2)FF_1C_i + F_iF_1C - (N-1)FF_iC_1}{F_1C - FC_1}. \tag{41}$$

By construction, these $C_i'$ satisfy properties (29)–(32) and hence, by the results in Case
1, $T' = F_1/C_1$. Hence the minimal makespan in the original problem is $T = F_1/C_1$ also. ■

It is worth noting that there is a lot of freedom in the choice of the $\alpha_{ij}$. We have chosen
a symmetric approach, but other choices are possible.

In practice, the file will not be infinitely divisible. However, we often have $M \gg \log(N)$
and this appears to be sufficient for (21) to be a good approximation. Thus, the fluid limit
approach of this section is suitable for typical and for large values of M.
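
The symmetric construction used in Case 1 of the proof can be verified on a small instance. The sketch below is an illustration only: the function names are hypothetical, the fractions follow the formula $\alpha_i^* = \big((N-1)C_i/C - F_i/F\big)/(N-2)$, the instance is made up, and exact rational arithmetic is used so that equalities can be checked without rounding. It assumes $N \ge 3$ and that the aggregate bound dominates.

```python
from fractions import Fraction


def fluid_limit_makespan(F, C):
    """Largest of the N individual bounds F_i/C_i and the aggregate bound (N-1)F/C."""
    n = len(F)
    return max(max(f / c for f, c in zip(F, C)), (n - 1) * sum(F) / sum(C))


def symmetric_fractions(F, C):
    """alpha*_i for the symmetric two-hop strategy (Case 1, N >= 3)."""
    n, total_f, total_c = len(F), sum(F), sum(C)
    return [((n - 1) * c / total_c - f / total_f) / (n - 2) for f, c in zip(F, C)]


def upload_times(F, C, alpha):
    """Time each user needs to push out its own file plus the data it relays."""
    n = len(F)
    times = []
    for i in range(n):
        direct = alpha[i] * F[i] * (n - 1)                       # own file, sent to everyone
        handed_to_relays = sum(alpha[k] for k in range(n) if k != i) * F[i]
        relayed = alpha[i] * (n - 2) * sum(F[k] for k in range(n) if k != i)
        times.append((direct + handed_to_relays + relayed) / C[i])
    return times


# Made-up instance in which the aggregate bound dominates.
F = [Fraction(3), Fraction(1), Fraction(2)]
C = [Fraction(2), Fraction(1), Fraction(3)]
alpha = symmetric_fractions(F, C)
print(fluid_limit_makespan(F, C), alpha, upload_times(F, C, alpha))
```

For this instance every user finishes at exactly the same time, and that time equals the aggregate bound, so no upload capacity is left idle.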

6 Decentralized Solution for Equal Capacities 

In order to give a lower bound on the minimal makespan, we have been assuming a centralized 
controller does the scheduling. We now consider a naive randomized strategy and investigate 
the loss in performance that is due to the lack of centralized control. We do this for equal 
capacities and in two different information scenarios, evaluating its performance by analytic 
bounds, simulation as well as direct computation. In Section 6.1 we consider the special case
of one file part, in Section 6.2 we consider the general case of M file parts. We find that
even this naive strategy disseminates the file in an expected time whose growth rate with
N is similar to the growth rate of the minimal time that we have found for a centralized
controller (cf. Section 3). This suggests that the performance of necessarily decentralized
P2P file dissemination systems should still be close to our performance bounds so that they 
are useful in practice. 

6.1 The special case of one file part 
Assumptions 

Let us start with the case M = 1. We must first specify what information is available to 
users. It makes sense to assume that each peer knows the number of parts into which the file 
is divided, M, and the address of the server. However, a peer might not know N, the total 
number of peers, nor its peers' addresses, nor if they have the file, nor whether they are at 
present occupied uploading to someone else. 

We consider two different information scenarios. In the first one, List, the number of
peers holding the file and their addresses are known. In the second one, NoList, the number
and addresses of all peers are known, but not which of them currently hold the file. Thus, in
List, downloading users choose uniformly at random between the server and the peers already
having the file. In NoList, downloading users choose uniformly amongst the server and all
their peers. If a peer receives a query from a single peer, he uploads the file to that peer.
If a peer receives queries from multiple peers, he chooses one of them uniformly at random.
The others remain unsuccessful in that round. Thus, in List transmission can fail only if too
many users try to download simultaneously from the same uploader. In NoList, transmission
might also fail if a user tries to download from a peer who does not yet have the file.
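
The two scenarios can be simulated round by round along the following lines. This is a minimal sketch written for this text, not the simulation code used for the results reported here; the function name and the convention that node 0 is the server are assumptions of the example.

```python
import random


def simulate_single_part(n, scenario, rng):
    """Number of rounds until all n peers hold the file (node 0 is the server)."""
    has_file = [True] + [False] * n
    rounds = 0
    while not all(has_file):
        holders = [v for v in range(n + 1) if has_file[v]]
        holder_set = set(holders)
        requests = {}                                  # target -> requesting peers
        for peer in range(1, n + 1):
            if has_file[peer]:
                continue
            if scenario == "List":
                target = rng.choice(holders)           # only nodes known to hold the file
            elif scenario == "NoList":
                target = rng.randrange(n)              # uniform over all nodes except itself
                if target >= peer:
                    target += 1
            else:
                raise ValueError(f"unknown scenario: {scenario}")
            requests.setdefault(target, []).append(peer)
        # Only nodes that held the file at the start of the round can upload,
        # and each of them serves a single requester chosen uniformly at random.
        winners = [rng.choice(peers) for target, peers in requests.items()
                   if target in holder_set]
        for peer in winners:
            has_file[peer] = True
        rounds += 1
    return rounds


rng = random.Random(2024)
for scenario in ("List", "NoList"):
    runs = [simulate_single_part(64, scenario, rng) for _ in range(200)]
    print(scenario, sum(runs) / len(runs))
```

Since the number of nodes holding the file can at most double in a round, no run can finish in fewer than $\lceil \log_2(N+1) \rceil$ rounds, which gives a quick sanity check on the output.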



Theoretical Bounds 

The following theorem explains how the expected makespan that is achieved by the randomized
strategy grows with N, in both the List and the NoList scenarios.

Theorem 5 In the uplink-sharing model, with equal upload capacities, the expected number
of rounds required to disseminate a single file to all peers in either the List or NoList scenario
is $\Theta(\log N)$.

Proof. In the List scenario our simple randomized algorithm runs in less time than in the
NoList scenario. Since we already have the lower bound given by the centralized optimum
(cf. Section 3), it suffices to prove that the expected running time in the NoList scenario is
$O(\log N)$. There is also a similar direct proof that the expected running time under the List
scenario is $O(\log N)$.

Suppose we have reached a stage in the dissemination at which $n_1$ peers (including the
server) have the file and $n_0$ peers do not, with $n_0 + n_1 = N + 1$. (The base case is $n_1 = 1$, when
only the server has the file.) Each of the peers that does not have the file randomly chooses
amongst the server and all his peers (NoList) and tries to download the file. If more than
one peer tries to download from the same place then only one of the downloads is successful.
The proof has two steps.

(i) Suppose that $n_1 \le n_0$. Let $i$ be the server or a peer who has the file and let $I_i$ be an
indicator random variable that is 0 or 1 as $i$ does or does not upload it. Let $Y = \sum_i I_i$, where
the sum is taken over all $n_1$ peers who have the file. Thus $n_1 - Y$ is the number of uploads
that take place. Then

$$E I_i = \left(1 - \frac{1}{N}\right)^{n_0} \le \left(1 - \frac{1}{N}\right)^{N/2} \le \frac{1}{\sqrt{e}}. \tag{42}$$

Now since $E(\sum_i I_i) = \sum_i E I_i$, we have $EY \le n_1/\sqrt{e}$. Thus, by the Markov inequality, which
states that for a nonnegative random variable $Y$ and any $k$ (not necessarily an integer)
$P(Y \ge k) \le (1/k)EY$, we have by taking $k = (2/3)n_1$,

$$P(n_1 - Y = \text{number of uploads} \le \tfrac{1}{3}n_1) = P(Y \ge \tfrac{2}{3}n_1) \le \frac{n_1/\sqrt{e}}{\tfrac{2}{3}n_1} = \frac{3}{2\sqrt{e}} < 1. \tag{43}$$

Thus the number of steps required for the number of peers who have the file to
increase from $n_1$ to at least $n_1 + (1/3)n_1 = (4/3)n_1$ is bounded by a geometric random
variable with mean $\mu = 1/(1 - 3/(2\sqrt{e}))$. This implies that we will reach a state in which
more peers have the file than do not in an expected time that is $O(\log N)$. From that point
we continue with step (ii) of the proof.

(ii) Suppose $n_1 > n_0$. Let $j$ be a peer who does not have the file and let $J_j$ be an indicator
random variable that is 0 or 1 as peer $j$ does or does not succeed in downloading it. Let
$Z = \sum_j J_j$, where the sum is taken over all $n_0$ peers who do not have the file. Suppose $X$ is
the number of the other $n_0 - 1$ peers that try to download from the same place as does peer
$j$. Then

$$\begin{aligned} P(J_j = 0) &= E\left[\frac{n_1}{N}\,\frac{1}{1+X}\right] \ge E\left[\frac{n_1}{N}(1 - X)\right] \\ &= \frac{n_1}{N}\left(1 - \frac{n_0 - 1}{N}\right) = \frac{n_1}{N}\left(1 - \frac{N - n_1}{N}\right) = \left(\frac{n_1}{N}\right)^2 \ge 1/4. \end{aligned} \tag{44}$$

Hence $EZ \le (3/4)n_0$ and so, again using the Markov inequality,

$$P(n_0 - Z = \text{number of downloads} \le \tfrac{1}{8}n_0) = P(Z \ge \tfrac{7}{8}n_0) \le \frac{\tfrac{3}{4}n_0}{\tfrac{7}{8}n_0} = \frac{6}{7}. \tag{45}$$

It follows that the number of peers who do not yet have the file decreases from $n_0$ to no more
than $(7/8)n_0$ in an expected number of steps no more than $1/(1 - 6/7) = 7$. Thus the
number of steps needed for the number of peers without the file to decrease from $n_0$ to 0 is
$O(\log n_0) = O(\log N)$. In fact, this is a weak upper bound. By more complicated arguments
we can show that if $n_0 = aN$, where $a \le 1/2$, then the expected remaining time for our
algorithm to complete under NoList is $O(\log\log N)$. For $a > 1/2$ the expected time remains
$\Theta(\log N)$. ■



Simulation 

For the problem with one server and N users we have carried out 1000 independent simulation 
runs^ for a large range of parameters, = 2, 4, . . . , 2^^. We found that the achieved expected 
makespan appears to grow as $a + b \times \log_2 N$. Motivated by this and the theoretical bound
from Theorem 121 we fitted the linear model 

$$y_{ij} = \alpha + \beta x_i + \epsilon_{ij}, \tag{46}$$

where $y_{ij}$ is the makespan for $x_i = \log_2 2^i$, obtained in run $j$, $j = 1, \ldots, 1000$. Indeed, the
model fits the data very well in both scenarios. We obtain the following results that enable
us to compare the expected makespan of the naive randomized strategy to that of a
centralized controller.

For List, the regression analysis gives a good fit, with Multiple R-squared value of 0.9975 
and significant p- and t-values. The makespan increases as 

$$1.1392 + 1.1021 \times \log_2 N. \tag{47}$$

*As many as 1000 runs were required for the comparison with the computational results in Tables 4 and 5,
mainly because the makespan always takes integer values.

For NoList, there is more variation in the data than for List, but, again, the linear regression
gives a good fit, with Multiple R-squared of 0.9864 and significant p- and t-values. The
makespan increases as

$$1.7561 + 1.5755 \times \log_2 N. \tag{48}$$

As expected, the additional information for List leads to a significantly lesser makespan when
compared to NoList, in particular the log-term coefficient is significantly smaller. In the List
scenario, the randomized strategy achieves a makespan that is very close to the centralized
optimum of $1 + \lfloor \log_2 N \rfloor$ of Section 3. It is only suboptimal by about 10%. Hence even this
simple randomized strategy performs well in both cases and very well when state information
is available, suggesting that our bounds are useful in practice.
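
A least-squares fit of this kind needs only a few lines. The sketch below is an illustration rather than the analysis behind (47) and (48): the function name is hypothetical, and as sample data it uses the simulated mean makespans for List with N = 2, 4, ..., 512 as read from Table 4, so the fitted coefficients will differ somewhat from (47), which is based on the individual runs over a larger range of N.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = alpha + beta * x; returns (alpha, beta)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    return mean_y - beta * mean_x, beta


# Simulated mean makespans for List, N = 2, 4, ..., 512 (values from Table 4).
log2_n = list(range(1, 10))
list_means = [2.000, 3.089, 4.167, 5.333, 6.534, 7.806, 8.994, 10.059, 11.107]

alpha, beta = fit_line(log2_n, list_means)
print(f"fitted makespan: {alpha:.4f} + {beta:.4f} * log2(N)")
```

The slope is the quantity of interest: dividing it by the centralized log-term coefficient of 1 gives the relative loss due to decentralization.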



Computations 

Alternatively, it is possible to compute the mean makespan analytically by considering a
Markov chain on the state space $0, 1, 2, \ldots, N$, where state $i$ corresponds to $i$ of the N peers
having the file. We can calculate the transition probabilities $p_{ij}$. In the NoList case, for
example, following the Occupancy Distribution (e.g., ^Sl), we obtain 

Pii+m = (i-j)!(i-L)!(j-i + m)! ( A-/) ■ ^^^^ 

Hence we can successively compute the expected hitting times $k(i)$ of state N starting from
state $i$ via

$$k(i) = \frac{1 + \sum_{j > i} p_{ij}\, k(j)}{1 - p_{ii}}. \tag{50}$$

The resulting formula is rather complicated, but can be evaluated exactly using arbitrary
precision arithmetic on a computer. Computation times are long, so to keep them shorter we
only work out the transition probabilities of the associated Markov chain exactly. Hitting
times are then computed in double-precision arithmetic, that is, to 16 significant digits. Even so,
computations are only feasible up to N = 512 with our equipment, despite repeatedly enhanced
efficiency. This suggests that simulation is the more computationally efficient approach to
our problem. The computed mean values for List and NoList are shown in Tables 4 and 5
respectively. The difference to the simulated values is small without any apparent trend. It
can also be checked by computing the standard deviation that the computed mean makespan
is contained in the approximate 95% confidence interval of the simulated mean makespan.
The only exception is for N = 128 for NoList where it is just outside by approximately 0.0016.

Thus, the computations confirm that our simulation results are accurate. Since simulation results
are also obtained more efficiently, we shall stick to simulation when investigating the general
case of M file parts in the next section.
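
The backward recursion for the hitting times is easy to state in code once the transition probabilities are available. The sketch below assumes the standard first-step relation $k(i) = 1 + \sum_j p_{ij}k(j)$ with $k(N) = 0$ and a chain that never moves to a lower state; the function name is hypothetical. As a check it uses the NoList chain for N = 2, whose transition probabilities are worked out by hand here: from state 0 at least one of the two peers picks the server with probability 3/4, and from state 1 the remaining peer always succeeds.

```python
from fractions import Fraction


def expected_hitting_times(p):
    """Expected number of rounds to reach the last state from each state.

    p[i][j] is the one-round transition probability from state i to state j;
    the chain is assumed never to move to a lower state (p[i][j] = 0 for j < i).
    """
    n = len(p) - 1
    k = [Fraction(0)] * (n + 1)                 # k[n] = 0: already finished
    for i in range(n - 1, -1, -1):
        tail = sum(p[i][j] * k[j] for j in range(i + 1, n + 1))
        k[i] = (1 + tail) / (1 - p[i][i])
    return k


# NoList with N = 2 peers (hand-derived transition probabilities).
p = [
    [Fraction(1, 4), Fraction(3, 4), Fraction(0)],
    [Fraction(0), Fraction(0), Fraction(1)],
    [Fraction(0), Fraction(0), Fraction(1)],
]
print(expected_hitting_times(p)[0])
```

The result for state 0 is 7/3, in agreement with the computed value 2.333 reported for N = 2 in the NoList scenario.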



  N     sim.     comp.    difference

  2     2.000    2.000     0.000
  4     3.089    3.083    +0.006
  8     4.167    4.172    -0.005
 16     5.333    5.319    +0.014
 32     6.534    6.538    -0.004
 64     7.806    7.794    +0.012
128     8.994    8.981    +0.013
256    10.059   10.057    +0.002
512    11.107   11.116    -0.009

Table 4: Simulated and computed mean makespans for List are close.

  N     sim.     comp.    difference

  2     2.314    2.333    -0.019
  4     4.071    4.058    +0.013
  8     5.933    5.956    -0.023
 16     7.847    7.867    -0.020
 32     9.689    9.710    -0.021
 64    11.430   11.475    -0.045
128    13.092   13.173    -0.081
256    14.827   14.819    +0.008
512    16.426   16.427    -0.001

Table 5: Simulated and computed mean makespans for NoList are close.

6.2 The general case of M file parts
Assumptions

We now consider splitting the file into several file parts. With the same assumptions as in
the previous section, we repeat the analysis for List for various values of M. Thus, in each
round, a downloading user connects to a peer chosen uniformly at random from those peers
that have at least one file part that the user does not yet have. An uploading peer randomly
chooses one out of the peers requesting a download from him. He uploads to that peer a file
part that is randomly chosen from amongst those that he has and the peer still needs.
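
The strategy with M file parts can be sketched in the same round-based style. Again this is an illustration written for this text rather than the simulation code used for the results below; the function name is hypothetical, node 0 is taken to be the server, and one round is taken to be the time needed to upload a single part, so that the makespan is the number of rounds divided by M.

```python
import random


def simulate_parts(n, m, rng):
    """Rounds until all n peers hold all m parts (List-type information)."""
    everything = set(range(m))
    have = [set(everything)] + [set() for _ in range(n)]    # node 0 is the server
    rounds = 0
    while any(have[v] != everything for v in range(1, n + 1)):
        requests = {}                                        # uploader -> requesting peers
        for peer in range(1, n + 1):
            if have[peer] == everything:
                continue
            # Nodes holding at least one part this peer still needs; the server
            # always qualifies, so the list is never empty.
            useful = [v for v in range(n + 1) if v != peer and have[v] - have[peer]]
            requests.setdefault(rng.choice(useful), []).append(peer)
        transfers = []
        for uploader, peers in requests.items():
            peer = rng.choice(peers)                         # serve one requester
            part = rng.choice(sorted(have[uploader] - have[peer]))
            transfers.append((peer, part))
        for peer, part in transfers:                         # parts arrive at the end of the round
            have[peer].add(part)
        rounds += 1
    return rounds


rng = random.Random(7)
for m in (1, 2, 5, 10):
    runs = [simulate_parts(64, m, rng) / m for _ in range(50)]
    print(m, sum(runs) / len(runs))
```

Dividing the round count by M puts the results on the same time scale as the single-part case, which makes the reduction in makespan with increasing M directly visible.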

Simulation 

Again, we consider a large range of parameters. We carried out 100 independent runs for each
iV = 2, 4, . . . , 2^^ For each value of M = 1 - 5, 8, 10, 15, 20, 50 we fitted the linear model 

dini). 

Table 6 summarizes the simulation results. The Multiple R-squared values indicate a good
fit, although the fact that these decrease with M suggests there may be a finer dependence on 
M or N. In fact, we obtain a better fit using Generalized Additive Models (cf. ^5)- However, 
our interest here is not in fitting the best possible model, but to compare the growth rate with 
N to the one obtained in the centralized case in Section 01 Moreover, from the diagnostic 
plots we note that the actual performance for large N is better than given by the regression 
line, increasingly so for increasing M. In each case, we obtain significant p- and t- values. The 
regression $0.7856 + 1.1520 \times \log_2 N$ for M = 1 does not quite agree with $1.1392 + 1.1021 \times \log_2 N$
found in H47|) . It can be checked, by repeating the analysis there for A^ = 2, 4, . . . , 2^^ that 
this is due to the different range of A^. Thus, our earlier result of 1.1021 might be regarded 
more reliable, being based on A^ ranging up to 2^^. 

We conclude that, as in the centralized scenario, the makespan can also be reduced
significantly in a decentralized scenario even when a simple randomized strategy is used to
disseminate the file parts. However, as we note by comparing the second and fourth columns
of Table 6, as M increases the achieved makespan compares less well relative to the
centralized minimum of $1 + (1/M)\lfloor \log_2 N \rfloor$. In particular, note the slower decrease of the log-term
coefficient. This is depicted in Figure 3.

Still, we have seen that even this naive randomized strategy disseminates the file in an
expected time whose growth rate with N is similar to the growth rate of the minimal time that
we have found for a centralized controller in Section 3, confirming our performance bounds are
useful in practice. This is confirmed also by initial results of current work on the performance
evaluation of the Bullet' system [20].

The program code for simulations as well as the computations and the diagnostic plots
used in this section are available on request and will be made available via the Internet^.

 M    Fitted makespan                Multiple R-squared    1/M

 1    0.7856 + 1.1520 × log2 N       0.9947                1.000
 2    1.3337 + 0.6342 × log2 N       0.9847                0.500
 3    1.4492 + 0.4561 × log2 N       0.9719                0.333
 4    1.4514 + 0.3661 × log2 N       0.9676                0.250
 5    1.4812 + 0.3045 × log2 N       0.9690                0.200
 8    1.4907 + 0.2113 × log2 N       0.9628                0.125
10    1.4835 + 0.1791 × log2 N       0.9602                0.100
15    1.4779 + 0.1326 × log2 N       0.9530                0.067
20    1.4889 + 0.1062 × log2 N       0.9449                0.050
50    1.4524 + 0.0608 × log2 N       0.8913                0.020

Table 6: Simulation results in the decentralized List scenario for various values of M and log-term
coefficients in the centralized optimum (cf. Section 3).

7 Discussion 

In this paper, we have given three complementary solutions for the minimal time to fully 
disseminate a file of M parts from a server to N end users in a centralized scenario, thereby 
providing a lower bound on and a performance benchmark for P2P file dissemination systems. 
Our results illustrate how the P2P approach, together with splitting the file into M parts, 
can achieve a significant reduction in makespan. Moreover, the server has a reduced workload 
when compared to the traditional client/server approach in which it does all the uploads itself.
We also investigate the part of the loss in efficiency that is due to the lack of centralized 
control in practice. This suggests that the performance of necessarily decentralized P2P 
file dissemination systems should still be close to our performance bound confirming their 
practical use. 

It would now be very interesting to compare dissemination times of the various efficient 
real overlay networks directly to our performance bound. A mathematical analysis of the 
protocols is rarely tractable, but simulation or measurements such as in ^\ and j.SOj for the 
BitTorrent protocol can be carried out in an environment suitable for this comparison. Cf.
also testbed results for Slurpie [HHI and simulation results for Avalanche J2]. It is current 
work to compare our bounds to the makespan obtained by Bullet' [20]. Initial results confirm 
their practical use further. 

In practice, splitting the file and passing on extra information has an overhead cost.
Moreover, with the Transmission Control Protocol (TCP), longer connections are more efficient
than shorter ones. TCP is used practically everywhere except for the Internet Control
Message Protocol (ICMP) and User Datagram Protocol (UDP) for real-time applications. For
further details see [HSI- Still, with an overhead cost it will not be optimal to increase M 
beyond a certain value. This could be investigated in more detail. 
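
As a rough illustration of this trade-off, the sketch below uses a purely hypothetical overhead model that is not taken from the paper: the schedule is assumed to consist of M + floor(log2 N) rounds, each normally of length 1/M, and every part is assumed to carry a fixed extra transmission time `overhead`. The function names and the overhead value are assumptions chosen for illustration only.

```python
def makespan_with_overhead(n_users: int, m_parts: int, overhead: float) -> float:
    """Hypothetical model: (M + floor(log2 N)) rounds of length 1/M + overhead."""
    rounds = m_parts + (n_users.bit_length() - 1)
    return rounds * (1 / m_parts + overhead)


def best_number_of_parts(n_users: int, overhead: float, m_max: int = 1000) -> int:
    """Brute-force search for the M in 1..m_max that minimizes the model makespan."""
    return min(range(1, m_max + 1),
               key=lambda m: makespan_with_overhead(n_users, m, overhead))


if __name__ == "__main__":
    n, h = 1000, 0.001
    m_star = best_number_of_parts(n, h)
    print(m_star, makespan_with_overhead(n, m_star, h))
```

In this model the makespan expands to 1 + L/M + hM + hL with L = floor(log2 N) and h the overhead, so the minimizing M lies near sqrt(L/h); beyond that point the per-part overhead outweighs the gain from finer splitting.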

In the proofs of Lemmas 1 and 2 we have used fair sharing and continuity assumptions.
It would be of interest to investigate whether one of them or both can be relaxed.

^ http://www.statslab.cam.ac.uk/~jm288/
Figure 3: Illustration of the log-term coefficients of the makespan from Table 6, plotted against M: the decentralized
List scenario (solid) and the idealized centralized scenario (dashed).
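
The comparison between the two sets of coefficients can be recomputed from the tabulated values. The sketch below uses only the Table 6 rows for M = 8 to 50 and takes 1/M as the centralized log-term coefficient, as listed in the last column of the table; the variable and function names are hypothetical.

```python
# Decentralized List scenario: fitted log-term coefficients from Table 6,
# keyed by the number of parts M.
decentralized_coeff = {8: 0.2113, 10: 0.1791, 15: 0.1326, 20: 0.1062, 50: 0.0608}


def coefficient_ratio(m_parts: int, coeff: float) -> float:
    """Ratio of the decentralized coefficient to the centralized one (1/M)."""
    return coeff / (1 / m_parts)


ratios = {m: coefficient_ratio(m, c) for m, c in decentralized_coeff.items()}

if __name__ == "__main__":
    for m, r in ratios.items():
        print(f"M={m:2d}  decentralized={decentralized_coeff[m]:.4f}  "
              f"centralized={1 / m:.3f}  ratio={r:.2f}")
```

For these rows the ratio stays above 1 and increases with M, although both coefficients themselves decrease as M grows.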

It would be interesting to generalize our results to account for a dynamic setting with peers 
arriving and perhaps leaving when they have completed the download of the file. In Internet 
applications users often connect for only relatively short times. Work in this direction, using 
a fluid model to study the steady-state performance, is pursued in [HJ and there is other 
relevant work in [37].

Also of interest would be to extend our model to users who prefer to free-ride and
do not wish to contribute uploading effort, or to users who might want to leave the system
once they have downloaded the whole file, a behaviour sometimes referred to as easy-riding.
The BitTorrent protocol, for example, implements a choking algorithm to limit free-riding.

In another scenario it might be appropriate to assume that users push messages rather 
than pull them. See for an investigation of the design space for distributed information 
systems. The push-pull distinction is also part of their classification. In a push system, the 
centralized case would remain the same. However, we expect the decentralized case to be 
different. There are a number of other interesting questions which could be investigated in 
this context. For example, what happens if only a subset of the users is actually interested 
in the file, but the uploaders do not know which?

From a mathematical point of view it would also be interesting to consider additional 
download constraints explicitly as part of the model, in particular when up- and download 
capacities are all different and not positively correlated. We might suppose that user i can 
upload at rate C_i and simultaneously download at rate D_i.

More generally, one might want to assume different capacities for all links between pairs. 
Or, phrased in terms of transmission times, let us assume that for a file to be sent from user 
i to user j it takes time t_ij. Then we obtain a transportation network, where instead of link
costs we now have link delays. This problem can be phrased as a one-to-all shortest path
problem if C_i is at least N + 1. This suggests that there might be some relation which could be
exploited. On the other hand, the problem is sufficiently different so that greedy algorithms, 
induction on nodes and dynamic programming do not appear to work. Background on these
can be found in and For M = 1, Prüfer's (N + 1)^(N-1) labelled trees [S] together with
the obvious O(N) algorithm for the optimal scheduling given a tree give an exhaustive search.
A Branch and Bound algorithm can be formulated.
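
The shortest-path special case mentioned above can be sketched as follows. The example assumes M = 1 and that every node can forward the file to all other nodes simultaneously as soon as it holds it, so that the earliest time at which user j can hold the file is the shortest-path distance from the server under the delays t_ij. The delay matrix and the function name are hypothetical, and Dijkstra's algorithm is used as a standard choice rather than as the method of the paper.

```python
import heapq


def earliest_arrival(delays, source=0):
    """Dijkstra's algorithm on a complete graph with link delays delays[i][j].

    Returns the earliest time at which each node can hold the file,
    assuming unrestricted simultaneous forwarding (no uplink sharing).
    """
    n = len(delays)
    dist = [float("inf")] * n
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, i = heapq.heappop(heap)
        if d > dist[i]:
            continue  # stale heap entry
        for j in range(n):
            if j != i and d + delays[i][j] < dist[j]:
                dist[j] = d + delays[i][j]
                heapq.heappush(heap, (dist[j], j))
    return dist


if __name__ == "__main__":
    # Node 0 is the server; nodes 1..3 are users (illustrative delays).
    t = [[0, 4, 1, 7],
         [4, 0, 2, 5],
         [1, 2, 0, 3],
         [7, 5, 3, 0]]
    arrival = earliest_arrival(t)
    print(arrival)       # earliest arrival time per node
    print(max(arrival))  # makespan in this special case
```

Once upload capacities are limited and must be shared among simultaneous transfers, arrival times are no longer independent of one another, which is why this reduction covers only the high-capacity case.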
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